Add out of context compaction test via error proxy #5805
    if skip_backoff {
        tracing::info!("Skipping backoff due to GOOSE_PROVIDER_SKIP_BACKOFF");
    } else {
        tracing::info!("Backing off for {:?} before retry", delay);
This thing is a big pain when testing with the proxy; you need to be very precise and knowledgeable about how many times it will back off.
makes sense. If you want, you could turn it into a real config variable, but that's probably overkill
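The pattern under discussion, gating the retry delay on an environment variable, can be sketched as follows. The actual change lives in the Rust file `crates/goose/src/providers/retry.rs`; this Python version only illustrates the idea. The helper names are hypothetical, and the rule for which values enable the flag (anything other than empty, `0`, or `false`) is an assumption, not something the PR states.

```python
import os
import time

SKIP_BACKOFF_ENV = "GOOSE_PROVIDER_SKIP_BACKOFF"


def should_skip_backoff(env=None):
    """Return True when the skip flag is set.

    Assumption: any value other than empty, "0", or "false" enables the skip.
    """
    env = os.environ if env is None else env
    value = env.get(SKIP_BACKOFF_ENV, "").strip().lower()
    return value not in ("", "0", "false")


def backoff_before_retry(delay_seconds, sleep=time.sleep, env=None):
    """Sleep for delay_seconds unless the skip flag is set.

    Returns True if it actually slept, False if the backoff was skipped.
    The sleep function is injectable so tests never have to wait.
    """
    if should_skip_backoff(env):
        print(f"Skipping backoff due to {SKIP_BACKOFF_ENV}")
        return False
    print(f"Backing off for {delay_seconds}s before retry")
    sleep(delay_seconds)
    return True
```

With the flag set, a test driving retries through the proxy no longer has to account for how long each backoff takes, only for how many retries occur.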
Pull Request Overview
This PR adds a third smoke test to validate that Goose properly handles out-of-context errors from providers by triggering compaction. The test uses a new error proxy tool to simulate context-length errors.
Key changes:
- Added TEST 3 to `test_compaction.sh` that uses the provider error proxy to inject context-length errors and verify compaction is triggered
- Enhanced `proxy.py` with `--mode` and `--no-stdin` CLI arguments for automated testing scenarios
- Added `GOOSE_PROVIDER_SKIP_BACKOFF` environment variable support to speed up retry testing
- Updated CI workflow to install Python 3.12 and `uv` for running the error proxy
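To make the idea of injecting a context-length error concrete, here is a minimal stand-in server. It is not the code in `proxy.py`: it does not forward requests to a real provider, and the function names, the `remaining_errors` parameter, and the OpenAI-style error body are all assumptions chosen for illustration. It fails the first N POST requests with a 400 and then succeeds, which is the shape of failure a client needs to see before it retries after compaction.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumed OpenAI-style error body; the real proxy's payload may differ.
CONTEXT_LENGTH_ERROR = {
    "error": {
        "message": "This model's maximum context length was exceeded.",
        "type": "invalid_request_error",
        "code": "context_length_exceeded",
    }
}


def make_handler(remaining_errors):
    """Build a handler that fails the first `remaining_errors` POSTs, then succeeds."""
    state = {"remaining": remaining_errors}
    lock = threading.Lock()

    class StubHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Drain the request body so the client can finish sending.
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            with lock:
                inject = state["remaining"] > 0
                if inject:
                    state["remaining"] -= 1
            if inject:
                self._send_json(400, CONTEXT_LENGTH_ERROR)
            else:
                self._send_json(200, {"ok": True})

        def _send_json(self, status, payload):
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep test output quiet

    return StubHandler


def start_stub(remaining_errors=1):
    """Start the stub on a free localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), make_handler(remaining_errors))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url, payload):
    """POST JSON and return (status, decoded body), including error statuses."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Ignore any system proxy settings so localhost is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())
```

A test would point the client's provider base URL at `http://127.0.0.1:<port>` (the port is available as `server.server_address[1]`), send one request that receives the injected error, and check that the follow-up request succeeds.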
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| scripts/test_compaction.sh | Adds TEST 3 for out-of-context error compaction using error proxy, removes duplicate jq check |
| scripts/provider-error-proxy/proxy.py | Refactors command parsing into shared function, adds CLI args for automated mode |
| scripts/provider-error-proxy/README.md | Documents new CLI arguments and automated testing usage patterns |
| crates/goose/src/providers/retry.rs | Adds GOOSE_PROVIDER_SKIP_BACKOFF env var to skip backoff delays in tests |
| .github/workflows/pr-smoke-test.yml | Adds Python 3.12 and uv setup steps before compaction tests |
    - name: Set up Python (for error proxy)
      uses: actions/setup-python@v5
      with:
        python-version: '3.12'
do you still need setup-python if you are using the uv action?
Yeah, I tried the uv action first and it didn't seem to come with Python.
scripts/test_compaction.sh
Outdated
    # Pre-install proxy dependencies (so first run doesn't take forever)
    echo "Installing proxy dependencies..."
    # Force UV to use public PyPI (override any local/corporate mirrors)
Is this OK/intended? I wasn't able to install the deps with the Block Artifactory link; I got something like a 404.
oh yeah I can see how you'd get there. It seems unnecessary if this is primarily going to be run in CI, but it's also mostly harmless
Was getting that issue in CI! Not sure how it was actually getting injected, but let's just go with this.
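One way to pin `uv` to the public index for a single command is to override its index environment variables in the child process only. The sketch below assumes the override is done through `UV_DEFAULT_INDEX` (and the older `UV_INDEX_URL` for earlier `uv` releases); the thread does not show which mechanism the script actually uses, and the helper names are hypothetical. Whether this wins over a given mirror depends on how that mirror was configured.

```python
import os
import subprocess

PUBLIC_PYPI = "https://pypi.org/simple"


def public_pypi_env(base=None):
    """Return a copy of the environment with uv pointed at public PyPI.

    Assumption: the mirror is injected through uv's index settings, so
    overriding these two variables is enough to bypass it.
    """
    env = dict(os.environ if base is None else base)
    env["UV_DEFAULT_INDEX"] = PUBLIC_PYPI
    env["UV_INDEX_URL"] = PUBLIC_PYPI  # older variable name, for older uv releases
    return env


def run_with_public_pypi(command, cwd=None):
    """Run a command (for example a uv install step) with the overridden environment."""
    return subprocess.run(
        command,
        cwd=cwd,
        env=public_pypi_env(),
        capture_output=True,
        text=True,
    )
```

Because the override applies only to the spawned process, a developer's global mirror configuration is left untouched.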
    echo "✗ FAILED: Could not install proxy dependencies"
    echo "Setup log:"
    cat "$PROXY_SETUP_LOG"
    RESULTS+=("✗ Out-of-Context Error (dependency install failed)")
scripts/test_compaction.sh
Outdated
    echo "Proxy log:"
    cat "$PROXY_LOG"
    kill $PROXY_PID 2>/dev/null || true
    RESULTS+=("✗ Out-of-Context Error (proxy failed)")
why do these say Out-of-Context Error?
Added the words 'Out of context test Error' to all of these to clarify.
Signed-off-by: Sai Karthik <kskarthik@disroot.org>
Signed-off-by: Blair Allan <Blairallan@icloud.com>
Adds an additional smoke test around compaction occurring after OutOfContext errors.
Also gets the proxy + python working in our CI env. Can potentially use the proxy for some kind of rate limit -> continue tests as well.